298 research outputs found

    Work-in-Progress: Quantized NNs as the Definitive Solution for Inference on Low-Power ARM MCUs?

    High energy efficiency and low memory footprint are the key requirements for the deployment of deep learning based analytics on low-power microcontrollers. Here we present work-in-progress results with Q-bit Quantized Neural Networks (QNNs) deployed on a commercial Cortex-M7 class microcontroller by means of an extension to the ARM CMSIS-NN library. We show that i) for Q=4 and Q=2 low memory footprint QNNs can be deployed with an energy overhead of 30% and 36% respectively against the 8-bit CMSIS-NN due to the lack of quantization support in the ISA; ii) for Q=1 native instructions can be used, yielding an energy and latency reduction of ~3.8× with respect to CMSIS-NN. Our initial results suggest that a small set of QNN-related specialized instructions could improve performance by as much as 7.5× for Q=4, 13.6× for Q=2 and 6.5× for binary NNs.
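
    The trade-off in point i) of the abstract above, a smaller footprint paid for with extra unpacking work, can be illustrated with a minimal sketch. It assumes unsigned 4-bit values stored two per byte, low nibble first. The layout and the names `pack_4bit` and `unpack_4bit` are illustrative assumptions, not the format or API of the CMSIS-NN extension described in the abstract.

```python
import numpy as np


def pack_4bit(values: np.ndarray) -> np.ndarray:
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first.

    Hypothetical helper: the storage layout is an assumption for illustration.
    """
    v = np.asarray(values, dtype=np.uint8)
    if v.size % 2 != 0:
        raise ValueError("expected an even number of values")
    if v.max(initial=0) > 15:
        raise ValueError("values must fit in 4 bits")
    # Even-indexed values go to the low nibble, odd-indexed to the high nibble.
    return (v[0::2] | (v[1::2] << 4)).astype(np.uint8)


def unpack_4bit(packed: np.ndarray) -> np.ndarray:
    """Recover the 4-bit values; this mask-and-shift step is the extra work
    a core without sub-byte instructions has to do before multiplying."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty(p.size * 2, dtype=np.uint8)
    out[0::2] = p & 0x0F
    out[1::2] = p >> 4
    return out


weights = np.arange(16, dtype=np.uint8)   # sixteen 4-bit values
packed = pack_4bit(weights)               # stored in eight bytes
restored = unpack_4bit(packed)
```

    Packing halves the storage relative to one value per byte, while every use of the weights first requires the mask-and-shift in `unpack_4bit`.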

    Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers

    Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning (RL) agent searches for the best uniform quantization levels, among 2, 4 and 8 bits, of individual weight and activation tensors, under tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MnasNet models for ImageNet classification. Concerning the quantization policy search, the RL agent selects quantization policies that maximize memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions quantized with a non-uniform function, which is not tailored for CPUs featuring integer-only arithmetic. This demonstrates the viability of uniform quantization, required for MCU deployments, for deep weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% on ImageNet thanks to the quantization policy found, making it 4% more accurate than the other 8-bit networks fitting the same memory constraints.
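
    The memory constraint in the abstract above can be sketched as a simple feasibility check on a per-tensor bit-width policy. This is a minimal sketch under stated assumptions: the symmetric per-tensor quantizer, the function names and the layer sizes are all hypothetical, and the RL search itself is not shown. The abstract does not specify the quantizer beyond it being uniform.

```python
import numpy as np


def quantize_uniform(w: np.ndarray, bits: int):
    """Symmetric uniform quantization to signed integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def weight_bytes(layer_sizes, policy) -> int:
    """Bytes needed to store all weights, given one bit width per tensor."""
    total_bits = sum(n * b for n, b in zip(layer_sizes, policy))
    return (total_bits + 7) // 8                # round up to whole bytes


def fits_budget(layer_sizes, policy, budget_bytes: int) -> bool:
    """True if the policy's weight footprint respects the memory bound."""
    return weight_bytes(layer_sizes, policy) <= budget_bytes


# Hypothetical network: three weight tensors with made-up sizes.
layer_sizes = [1_000_000, 2_000_000, 1_000_000]
budget = 2 * 1024 * 1024                        # the 2 MB bound from the abstract

all_8bit_ok = fits_budget(layer_sizes, [8, 8, 8], budget)   # 4,000,000 bytes
mixed_ok = fits_budget(layer_sizes, [8, 2, 4], budget)      # 2,000,000 bytes
```

    With these made-up sizes the all-8-bit policy exceeds the 2 MB bound while the mixed policy fits, which is the kind of candidate a search agent would go on to score for accuracy.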

    Decadal Variability in the Northeast Pacific in a Physical-Ecosystem Model: Role of Mixed Layer Depth and Trophic Interactions

    A basin-wide interdecadal change in both the physical state and the ecology of the North Pacific occurred near the end of 1976. Here we use a physical-ecosystem model to examine whether changes in the physical environment associated with the 1976-1977 transition influenced the lower trophic levels of the food web and if so by what means. The physical component is an ocean general circulation model, while the biological component contains 10 compartments: two phytoplankton, two zooplankton, two detritus pools, nitrate, ammonium, silicate, and carbon dioxide. The model is forced with observed atmospheric fields during 1960-1999. During spring, there is a ~40% reduction in plankton biomass in all four plankton groups during 1977-1988 relative to 1970-1976 in the central Gulf of Alaska (GOA). The epoch difference in plankton appears to be controlled by the mixed layer depth. Enhanced Ekman pumping after 1976 caused the halocline to shoal, and thus the mixed layer depth, which extends to the top of the halocline in late winter, did not penetrate as deep in the central GOA. As a result, more phytoplankton remained in the euphotic zone, and phytoplankton biomass began to increase earlier in the year after the 1976 transition. Zooplankton biomass also increased, but then grazing pressure led to a strong decrease in phytoplankton by April followed by a drop in zooplankton by May: essentially, the mean seasonal cycle of plankton biomass was shifted earlier in the year. As the seasonal cycle progressed, the difference in plankton concentrations between epochs reversed sign again, leading to slightly greater zooplankton biomass during summer in the later epoch.

    Mixed-data-model heterogeneous compilation and OpenMP offloading

    Heterogeneous computers combine a general-purpose host processor with domain-specific programmable many-core accelerators, uniting high versatility with high performance and energy efficiency. While the host manages ever-more application memory, accelerators are designed to work mainly on their local memory. This difference in addressed memory leads to a discrepancy between the optimal address width of the host and the accelerator. Today 64-bit host processors are commonplace, but few accelerators exceed 32-bit addressable local memory, a difference expected to increase with 128-bit hosts in the exascale era. Managing this discrepancy requires support for multiple data models in heterogeneous compilers. So far, compiler support for multiple data models has not been explored, which hampers the programmability of such systems and inhibits their adoption. In this work, we perform the first exploration of the feasibility and performance of implementing a mixed-data-model heterogeneous system. To support this, we present and evaluate the first mixed-data-model compiler, supporting arbitrary address widths on host and accelerator. To hide the inherent complexity and to enable high programmer productivity, we implement transparent offloading on top of OpenMP. The proposed compiler techniques are implemented in LLVM and evaluated on a 64+32-bit heterogeneous SoC. Results on benchmarks from the PolyBench-ACC suite show that memory can be transparently shared between host and accelerator at overheads below 0.7% compared to 32-bit-only execution, enabling mixed-data-model computers to execute at near-native performance.

    Growth variations and scattering mechanisms in metamorphic In0.75Ga0.25As/In0.75Al0.25As quantum wells grown by molecular beam epitaxy

    Modulation-doped metamorphic In0.75Ga0.25As/In0.75Al0.25As quantum wells (QWs) were grown on GaAs substrates by molecular beam epitaxy (MBE) with step-graded buffer layers. The electron mobility of the QWs has been improved by varying the MBE growth conditions, including substrate temperature, arsenic overpressure and modulation doping level. By applying a bias voltage to SiO2 insulated gates, the electron density in the QW can be tuned from 1×10¹¹ to 5.3×10¹¹ cm⁻². A peak mobility of 4.3×10⁵ cm² V⁻¹ s⁻¹ is obtained at 3.7×10¹¹ cm⁻² at 1.5 K before the onset of second subband population. To understand the evolution of mobility, transport data are fitted to a model that takes into account scattering from background impurities, modulation doping, alloy disorder and interface roughness. According to the fits, scattering from background impurities is dominant, while that from alloy disorder becomes more significant at high carrier density.

    Low-frequency variability in the Gulf of Alaska from coarse and eddy-permitting ocean models

    An eddy-permitting ocean model of the northeast Pacific is used to examine the ocean adjustment to changing wind forcing in the Gulf of Alaska (GOA) at interannual-to-decadal timescales. It is found that the adjustment of the ocean model in the presence of mesoscale eddies is similar to that obtained with coarse-resolution models. Local Ekman pumping plays a key role in forcing pycnocline depth variability and, to a lesser degree, sea surface height (SSH) variability in the center of the Alaska gyre and in some areas of the eastern and northern GOA. Westward Rossby wave propagation is evident in the SSH field along some latitudes but is less noticeable in the pycnocline depth field. Differences between SSH and pycnocline depth are also found when considering their relationship with the local forcing and leading modes of climate variability in the northeast Pacific. In the central GOA, pycnocline depth variations are more clearly related to changes in the local Ekman pumping than SSH variations are. While SSH is marginally correlated with both Pacific Decadal Oscillation (PDO) and North Pacific Gyre Oscillation (NPGO) indices, the pycnocline depth evolution is primarily related to NPGO variability. The intensity of the mesoscale eddy field increases with increasing circulation strength. The eddy field is generally more energetic after the 1976–1977 climate regime shift, when the gyre circulation intensified. In the western basin, where eddies primarily originate from intrinsic instabilities of the flow, variations in eddy kinetic energy show a statistically significant correlation with the PDO index, indicating that eddy statistics may be inferred, to some degree, from the characteristics of the large-scale flow.

    Multi-Color Imaging of Magnetic Co/Pt Multilayers

    We demonstrate for the first time a spatially resolved, two-color, element-specific imaging experiment at the free-electron laser facility FERMI. Coherent imaging using Fourier transform holography was used to achieve direct real-space access to the nanometer length scale of magnetic domains of Co/Pt heterostructures via the element-specific magnetic dichroism in the extreme ultraviolet spectral range. As a first step towards implementing this technique for studies of ultrafast phenomena, we present the spatially resolved response of magnetic domains upon femtosecond laser excitation.

    Two-dimensional electron gas formation in undoped In[0.75]Ga[0.25]As/In[0.75]Al[0.25]As quantum wells

    We report on the achievement of a two-dimensional electron gas in completely undoped In[0.75]Al[0.25]As/In[0.75]Ga[0.25]As metamorphic quantum wells. Using these structures we were able to reduce the carrier density with respect to reported values in similar modulation-doped structures. We found experimentally that the electronic charge in the quantum well is likely due to a deep-level donor state in the In[0.75]Al[0.25]As barrier band gap, whose energy lies within the In[0.75]Ga[0.25]As/In[0.75]Al[0.25]As conduction band discontinuity. This result is further confirmed through a Poisson-Schroedinger simulation of the two-dimensional electron gas structure. Comment: 17 pages, 6 figures, to be published in J. Vac. Sci. Technol.

    Seeded x-ray free-electron laser generating radiation with laser statistical properties

    The invention of optical lasers led to a revolution in the field of optics and even to the creation of completely new fields of research such as quantum optics. The reason was their unique statistical and coherence properties. The newly emerging, short-wavelength free-electron lasers (FELs) are sources of very bright coherent extreme-ultraviolet (XUV) and x-ray radiation with pulse durations on the order of femtoseconds, and are presently considered to be laser sources at these energies. Most existing FELs are highly spatially coherent but, in spite of their name, they behave statistically as chaotic sources. Here, we demonstrate experimentally, by combining Hanbury Brown and Twiss (HBT) interferometry with spectral measurements, that the seeded XUV FERMI FEL-2 source does indeed behave statistically as a laser. The first steps have been taken towards exploiting the first-order coherence of FELs, and the present work opens the way to quantum optics experiments that strongly rely on high-order statistical properties of the radiation. Comment: 24 pages, 10 figures, 37 references

    NEURAghe: Exploiting CPU-FPGA synergies for efficient and flexible CNN inference acceleration on Zynq SoCs

    Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN SW stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 GOps/s and an energy efficiency of 17 GOps/W. Thanks to our heterogeneous computing model, our platform improves upon the state-of-the-art, achieving a frame rate of 5.5 frames per second (fps) on the end-to-end execution of VGG-16 and 6.6 fps on ResNet-18.